Neisseria meningitidis causes bacterial meningitis and is therefore
responsible for considerable morbidity and mortality in both the
developed and the developing world. Meningococci are opportunistic
pathogens that colonize the nasopharynges and oropharynges
of asymptomatic carriers. For reasons that are still
mostly unknown, they occasionally gain access to the blood, and
subsequently to the cerebrospinal fluid, to cause septicaemia and
meningitis. N. meningitidis strains are divided into a number of 
serogroups on the basis of the immunochemistry of their capsular 
polysaccharides; serogroup A strains are responsible for major 
epidemics and pandemics of meningococcal disease, and therefore 
most of the morbidity and mortality associated with this disease. 
Here we have determined the complete genome sequence of a 
serogroup A strain of Neisseria meningitidis, Z2491 (ref. 1). The 
sequence is 2,184,406 base pairs in length, with an overall G+C 
content of 51.8%, and contains 2,121 predicted coding sequences. 
The most notable feature of the genome is the presence of many 
hundreds of repetitive elements, ranging from short repeats, 
positioned either singly or in large multiple arrays, to insertion
sequences and gene duplications of one kilobase or more. Many of 
these repeats appear to be involved in genome fluidity and 
antigenic variation in this important human pathogen. 

The genome encodes complete sets of enzymes for glycolysis
(apart from fruK and pfkA), gluconeogenesis, the pentose-phosphate
and Entner-Doudoroff pathways, the pyruvate dehydrogenase
complex and the tricarboxylic acid cycle. In addition to
the aerobic respiration genes, there are also both nitrite (aniA) and
nitrate (NMA1886) reductases; some capability for fermentation is
also apparent. N. meningitidis appears to be capable of de novo
synthesis of most of the amino acids (with the exception of
asparagine and methionine) and purine and pyrimidine nucleotides;
however, the pathways for production of folic acid, molybdopterin,
pantothenate and pyridoxine are incomplete. All of the aminoacyl
transfer RNA synthetases are present except that for tRNAAsn. As
N. meningitidis encodes both glutaminyl tRNA synthetase and the
three-subunit Glu-tRNAGln transamidase, the latter may be
responsible for the production of tRNAAsn by the transamidation
of tRNAAsp 1.

In addition to the 2,121 predicted coding sequences (CDSs), there
are 4 copies of a 16S-23S-5S ribosomal RNA operon, 58 tRNAs, 1
tmRNA (10Sa RNA) and the RNA component of RNAase P. The
average gene length is 877 base pairs (bp), at a density of 1 per 1.03
kilobases (kb). The overall coding density is 82.9%, higher than
Rickettsia, but lower than most other sequenced bacteria. Base 1 of
the sequence was chosen to correspond roughly with 12 o'clock on
the published map 1; the origin of replication (as indicated by the
bias towards G on the leading strand 4) is around 247,600 bp.
Intriguingly, although the GC bias is similar in both replichores,
there is a clear bias (about 60%) of CDSs towards the leading strand in
only one replichore (anticlockwise in Fig. 1), with no coding bias at
all on the opposite replichore. The genome contains at least 56
pseudogenes, of which 17 are remnants of insertion sequence (IS)
elements, and there are at least 5 complete or partial prophages
(pnm1-5).
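
The gene density quoted above follows from the genome length and the CDS count. The Python sketch below reproduces that arithmetic; the function names are illustrative and do not come from the paper, and the coding-fraction helper assumes non-overlapping CDSs, which the text does not state.

```python
# Figures taken from the text above.
GENOME_LENGTH_BP = 2_184_406
PREDICTED_CDS = 2_121


def gene_density_kb(genome_length_bp: int, n_genes: int) -> float:
    """Average genome span per gene, in kilobases."""
    return genome_length_bp / n_genes / 1000


def coding_fraction(cds_lengths_bp, genome_length_bp: int) -> float:
    """Fraction of the genome covered by CDSs (assumes no overlaps)."""
    return sum(cds_lengths_bp) / genome_length_bp


# One gene per ~1.03 kb, matching the density stated in the text.
print(round(gene_density_kb(GENOME_LENGTH_BP, PREDICTED_CDS), 2))
```

Given a list of annotated CDS lengths, `coding_fraction` would give the corresponding coding density; the per-gene lengths themselves are not listed in this text.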

The G+C content of the genome is extremely variable, with at
least 60 coding regions (nearly 5% in total) having a significantly
lower G+C content, ranging in size from 224 bp to 11.3 kb and
averaging 1.8 kb (see Supplementary Information). These regions
may represent recently acquired DNA, and given the natural
competence of neisserial species 5, this is perhaps unsurprising.
Although most genes in these regions have no database matches,
or are conserved hypothetical proteins, nearly one-third encode

Figure 1 Circular representation of the N. meningitidis Z2491 genome. The concentric
circles show, reading inwards: the scale in megabases, with the origin of replication
indicated; predicted coding sequences clockwise (dark green) and anti-clockwise (light
green); neisserial uptake sequences (red); dRS3 sequences (dark orange); RS elements
(light orange); dispersed repeats (Correia, ATR, REP2-5; black); IS elements and phage
(narrow ticks and wide bars respectively; turquoise) and tandem repeats (dark blue). The
inner histogram shows a plot of (G-C)/(G+C), with values greater than zero in yellow and less
than zero in orange. Figure generated with LASERGENE software (DNAStar).
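
The skew statistic plotted in the inner histogram, (G-C)/(G+C), can be computed per window as in the Python sketch below. The cumulative-minimum helper is a commonly used heuristic for placing an origin of replication on a linear representation of a chromosome whose leading strand is G-rich. It is shown as an illustration only and is not necessarily the procedure the authors used; the function names and window parameters are assumptions.

```python
def gc_skew(seq: str) -> float:
    """(G - C) / (G + C) for one sequence window; 0.0 if no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int, step: int):
    """List of (window start, skew) pairs along a linear sequence."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def cumulative_skew_minimum(seq: str) -> int:
    """Number of bases consumed when the running (G - C) count is lowest.

    Heuristic: on a G-rich leading strand the running count falls until
    the origin and rises after it, so the minimum approximates the origin.
    """
    total, best, best_pos = 0, 0, 0
    for pos, base in enumerate(seq.upper(), start=1):
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        if total < best:
            best, best_pos = total, pos
    return best_pos
```

A circular genome would need the sequence to be treated as wrapping around; this sketch ignores that for brevity.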
[...] that are likely to be on the surface of the cell, or are
[...] characterized CDSs of the serogroup A [...]
surface-exposed protein-encoding genes are [...] region of
A+T-rich DNA [...] and NMA1083. Eight A+T-rich regions [...]
modification enzymes. Some caution is required in interpreting
a bias in G+C composition as necessarily indicating recently
acquired DNA: the largest region of A+T-rich DNA (41.7% G+C
[...]) encompasses much of the ribosomal protein operon.

One of the most striking [...] genome is the abundance and
diversity of repetitive DNA (Fig. 1). The most obvious example of
this abundance is the [...] uptake sequence, which is involved in
the recognition and uptake [...] the environment. There are
nearly two thousand copies of the 10-bp uptake sequence, which
either occurs [...] inverted repeats as part of a transcriptional
terminator. Other repetitive sequence elements in the
N. meningitidis genome are [...]

repeat arrays are composed of several different repeat types (Fig. 2
and Table 1), the most abundant of which are the 'neisserial
intergenic mosaic elements' (NIMEs), which comprise repeat
units of ~50-150 bp (RS elements), each flanked by
repeats (dRS3 elements); there are between 1 and 60 copies of 117
different families of RS element.
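
Counting the short repeats described above is a simple pattern search. The Python sketch below counts the 10-bp uptake sequence on both strands and locates dRS3 elements, using the sequences given in the repeat table of this text, read here as GCCGTCTGAA and ATTCCC-N8-GGGAAT. The helper names are illustrative, and the paper's own repeat detection used profile-based tools, not this exact-match approach.

```python
import re

DUS = "GCCGTCTGAA"  # 10-bp DNA uptake sequence
# dRS3: ATTCCC, eight arbitrary bases, GGGAAT. The fixed arms are reverse
# complements of each other, so one strand is enough. The lookahead lets
# overlapping matches be reported.
DRS3 = re.compile(r"(?=(ATTCCC[ACGT]{8}GGGAAT))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def count_both_strands(seq: str, motif: str) -> int:
    """Occurrences of motif on either strand of seq."""
    s = seq.upper()
    rc = reverse_complement(motif)
    total = count_overlapping(s, motif)
    if rc != motif:  # avoid double counting palindromic motifs
        total += count_overlapping(s, rc)
    return total


def drs3_positions(seq: str):
    """0-based start positions of dRS3 matches."""
    return [m.start() for m in DRS3.finditer(seq.upper())]
```

Exact matching will miss diverged copies, which is one reason profile methods are preferred for families such as the RS elements.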

Also present in the repeat arrays are larger units, which are also
found in isolation, including the 'Correia elements' (CE; 156-bp
sequences bounded by 26-bp inverted repeats) that have been
previously described in both N. meningitidis and Neisseria
gonorrhoeae. Almost one-third of the 257 CEs share a common
51-bp internal deletion, which presumably has not affected the
ability of the element to be mobilized, and 29 have lost one or both
of their terminal inverted repeats. Correia elements in [...]
are sometimes flanked [...] and downstream [...]. [...]
deleted CEs occur embedded within other sequence features,
including RS elements, other CEs and ISs. The pseudogenes
[...] (modification methylase) and NMA0530 [...]
by CE insertion. Two repeat types are dispersed over [...]
low frequency, both alone and within repeat arrays: 19 copies of a
[...]-bp (30% G+C) repeat (ATR) whose ends form an
imperfect 35-bp inverted repeat and 26 copies of a [...]-bp
repeat (REP2). REP2 occurs immediately upstream of [...]
a ribosome-binding-site-like conserved AAGGA motif
within 5-13 bp of the predicted start codon. It seems likely that
sequences within this repeat would exert some effect on the
expression of these genes. The repeat arrays appear to be [...]
for the insertion of the mobile elements in the genome, with almost
half of the 43 complete and partial copies of IS1016, IS1106 and
IS1655 being integrated in or near to a repeat array.

[Figure 2 panel labels: NIME (70-200 bp); RS element; 20 bp; 26 bp, 26 bp; Correia (156 bp); REP3]

Given the natural competence of neisserial species, the presence
of such arrays could encourage sequence variation by acting as a
target for a specific recombinase, thereby increasing the rate of
horizontal gene transfer at the flanking locus. [...]
allelic replacement or the creation of new mosaic alleles. A previously
characterized example of enhanced
exchange involves the rearrangements responsible for the variation
of pilE expression. PilE is the main structural pilin component
and is expressed from the intact pilE gene. In N. gonorrhoeae, silent
pilin genes (pilS) exist, which contain only the 3' region of the gene
without a start codon or transcriptional signals. Coding sequences
from these silent regions can be recombined into the expression
locus, in a recA/Q/O-dependent manner, changing the expressed
PilE sequence 10. Models put forward for this process suggest that the
gene conversion may be intergenomic. In the
Z2491 sequence, the pilE gene is surrounded by repeat arrays
consisting of NIMEs and a CE (Fig. 3a). Immediately upstream of
the pilE gene are eight pilS loci, each without a 5' end and
intimately bounded by arrays of NIMEs, suggesting that NIME
sequences may be involved in pilS/E recombination. The
N. gonorrhoeae Sma/Cla repeat, implicated in enhancing pilE/S
recombination in N. gonorrhoeae, exists as a single copy in
N. meningitidis Z2491, immediately downstream of the pilE gene.

Our analysis of the occurrence of repeat arrays suggests that
antigenic variation mediated by gene conversion, similar to that of
pilE, may occur widely in the N. meningitidis genome. In N.
meningitidis Z2491, most shorter repeat arrays are associated with
genes directly or indirectly related to cell-surface functions (including
pilins, substrate-binding proteins and transporter components),
and the larger repeat arrays are exclusively associated with
such genes (Fig. 4; and Table 2, Supplementary Information). The

[Figure 2 panel labels, continued: ATR (183 bp); 40 bp; REP2 (134-154 bp)]

Figure 2 Types of N. meningitidis repeat. The name of each repeat type is indicated above
the repeat. Inverted repeat sequences are represented by open triangles, and the internal
deletion present in some Correia elements by a hatched box. The translational signals
within REP2 are indicated below the repeat.

NATURE | VOL 404 | 30 MARCH 2000 | www.nature.com

Table 1 Number of repeats by type in N. meningitidis Z2491

Type                               Size (bp)        Frequency
DNA uptake sequence: gccgtctgaa    10
RS                                 24-161
dRS3: attcccnnnnnnnngggaat         20
Correia (full)                     150-1?9
Correia (internal deletion)        ~104
Correia (partial)                  37-145
ATR                                163              19
REP2                               59-164
REP3                               90
REP4                               26
REP5                               20
IS1016                             256-740
IS1106                             203-1,215
IS1655                             1,074-1,257
Prophage                           2,330-38,9?4

[Only one frequency value (19) is legible in the Frequency column; the remaining values and the digits marked "?" could not be recovered.]

products of these genes are potential targets for the immune system,
and antigenic variation of these proteins would be beneficial in
evading the immune response. The genes encoding the outer
membrane proteins PorA (NMA1642) and PorB (NMA0398) are
flanked on both sides by arrays of repeats, as are lbpAB (ref. 13)
(Fig. 3b), tbpAB (ref. 14) and hpuAB (ref. 15), which encode
lactoferrin-, transferrin- and haemoglobin-haptoglobin-binding
proteins, respectively. The putative ferric enterobactin-binding
protein genes fetAB are also associated with two such arrays. In
contrast, the neisserial ferric binding protein operon fbpABC
(ref. 16), which may not be exposed to the immune response, the
TonB-dependent receptor NMA0577 and the haemoglobin receptor
hmbR (ref. 17) are not associated with repeat elements (a poly-(G)
tract is present in hmbR, potentially allowing phase-variable expression;
see below). The opaD, opaB, opaA and opcA genes, which are
involved in various interactions with the host cell, are all flanked on
one or both sides by repeat arrays. Comparison of these regions
from a range of N. meningitidis isolates has shown that the repeat
arrays themselves are highly variable between diverse Neisseriae but
may be identical in related strains even after more than 30 years of
epidemic spread (G.M., unpublished data).

Repeat-mediated gene rearrangements are also apparent at other
sites, although, in these cases, the repeats are local. One such site
encompasses the mafA and mafB genes; mafA is predicted to be a
lipoprotein and mafB to be secreted; both have been proposed to
have adhesin activity in N. gonorrhoeae (S. Eickernjaeger et al.,
personal communication; EMBL accession number AFH2582).
The N. meningitidis orthologue of mafB is immediately followed
by a 6.5-kb region of low G+C DNA encoding mainly short genes of
unknown function. Interspersed with these genes, and with a
normal G+C content, are three repeats of 303 bp and one of 79
bp from mafB, each corresponding to the beginning of an open
reading frame (ORF) that has no start codon. This arrangement
suggests that these repeat sequences may be capable of recombination

Figure 3 Structure of selected repeat regions. a, Repeats around the pilE/S locus. pil
genes are indicated by open boxes and repeats by coloured boxes as described in the key.
b, Large repeat arrays around the lbpA and lbpB genes. Repeats are coloured as in a. c,
The filamentous haemagglutinin homologue gene sequences of N. meningitidis Z2491
and Z4259. Conserved sequences are indicated by light blue bars, and internal repeat
regions by red arrows. The complex repeat structures have been simplified for clarity.
Open boxes represent genes encoding surface-exposed proteins, and green boxes
represent genes with no database matches. The brown boxes represent the two halves of
an ABC transporter pseudogene.

Figure 4 Graph showing the relationship between repeat array length and flanking gene
function. For each category of repeat array length the number of flanking genes
associated with surface structures is shown in grey, and the number of genes in all other
categories is shown hatched.

with the mafB gene, leading to the attachment of different 3'
sequences. This interpretation is supported by the observation
that the 3' sequence of the gonococcal mafB orthologue is almost
identical to the ORF following the first 303-bp repeat, and not to the
3' sequence of the meningococcal mafB gene.

A second example of repeat-mediated rearrangement of the 3'
ends of genes encoding surface-exposed proteins is provided by
NMA0688, a homologue of the Bordetella pertussis filamentous
haemagglutinin, fhaB (ref. 19). The sequence downstream of
NMA0688 contains complex repeated sequences from NMA0688
(Fig. 3c) and encodes several proteins of unknown function.
Sequencing this region from N. meningitidis Z4259 (a serogroup
C strain; EMBL accession number AJ391284) revealed a similar
repeat structure, but with additional DNA inserted into the 3'
sequence of NMA0688 at one of the repeats. This has the effect of
changing the 3' end of the NMA0688 gene, as well as introducing
additional genes, some containing similar repeats. The 3' region of
the Z2491 NMA0688 gene is within a downstream ORF in Z4259. It
therefore seems likely that, like mafB, the 3' end of this fhaB
homologue can be altered by recombination with alternative 3'
ends. Intriguingly, the entire region consisting of the fhaC
(NMA0687) and fhaB homologues, along with the downstream
DNA, appears to have been inserted into the coding sequence of an
ABC transporter pseudogene (NMA0686/NMA0699).

Tandem repeats of nucleotides in Neisseria are subject to lengthening
or shortening during replication owing to a phenomenon called
slipped-strand mispairing 20. When these tandem repeats are present
in coding sequences, this change in length can change the translation
state of the gene, leading to an on/off switching of the gene
product called 'phase variation'. The complete genome sequence
reveals around 26 tandem repeats indicating potentially phase-variable
genes (Table 3, Supplementary Information), many of
which have been studied before. These repeats range from homopolymeric
tracts of G or C nucleotides to di-, tetra- and pentanucleotide
repeats. As would be expected, most phase-variable
genes are either surface-exposed or involved in the biosynthesis or
modification of surface structures, which supports their suggested
role in virulence and/or immune avoidance. Four of the tandem
repeats are not directly within coding sequences, but are immediately
upstream of candidate variable genes, and these may affect transcription
from gene-specific promoters, as has been shown for opcA
(ref. 21) and porA (ref. 22).
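
The on/off switching described above for repeats inside a coding sequence comes down to reading-frame arithmetic: a gain or loss of repeat units keeps the gene in frame only when the net change in length is a multiple of three. The Python sketch below illustrates this under simplifying assumptions. The function names and the notion of a reference "on" copy number are hypothetical, and promoter-located repeats, which act on transcription instead, are not modelled.

```python
def frame_shift(unit_length: int, units_changed: int) -> int:
    """Net reading-frame shift (0, 1 or 2) caused by gaining
    (positive) or losing (negative) repeat units."""
    return (unit_length * units_changed) % 3


def is_in_frame(unit_length: int, n_units: int, n_units_on: int) -> bool:
    """True if a tract with n_units copies keeps the frame of the
    reference state n_units_on, in which the gene is assumed 'on'."""
    return frame_shift(unit_length, n_units - n_units_on) == 0


# Homopolymeric G tract (unit length 1), assumed 'on' at 10 Gs.
for n in range(8, 14):
    print(n, "on" if is_in_frame(1, n, 10) else "off")
```

For a tetranucleotide repeat the frame returns to "on" only after every third unit gained or lost, since 4 mod 3 = 1.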

The analysis presented here supports the general operation of
three mechanisms of repeat-mediated antigenic variation within the
N. meningitidis genome: on/off switching and transcriptional
modulation of gene expression by slipped-strand mispairing of
short tandem repeats; intragenomic recombination of localized
repeats leading to the use of different carboxy termini for surface-exposed
proteins; and intergenomic gene conversion of specific
surface-associated genes associated with large arrays of global
repeats, mediated by the internalization of related DNA through
the highly repetitive DNA uptake sequence.

Methods 

DNA was prepared from N. meningitidis Z2491 as described. The DNA was fragmented
by sonication, size-fractionated on an agarose gel, and two libraries were generated in
pUC18 using size fractions ranging from 0.5 to 1.5 kb. Roughly 37,500 pUC clones were
sequenced from both ends using Dye-terminator chemistry on ABI 373 and 377
sequencing machines; 58,269 reads were used to generate the final assembly, giving about a
10-fold coverage of the genome. Sequence assembly was accomplished using Phrap
(P. Green, unpublished), and the sequencing was finished using GAP4 (ref. 24). The
assembly was verified by genomic PCR reactions across all unbridged large repeats, in
addition to 260 forward and reverse reads from a random library of 20-22 kb cloned in
lambda FixII (Stratagene), and 610 forward and reverse reads from 3 libraries of 9-13 kb
cloned in pSP64 (Promega). The final assembly was checked against the published map
using the positions of restriction sites, mapped genes and lambda clones. In the final
assembly, less than 0.07% of the genome was covered by a single clone only, and less than
0.12% was unsequenced on both strands, or with complementary sequencing chemistries.
The DNA was compared with sequences in the EMBL database using BLASTN and
BLASTX. Transfer RNAs were predicted by tRNAscan-SE (ref. 26). Potential CDSs were
predicted using ORPHEUS and GLIMMER (both trained on an initial ORF set
generated by ORPHEUS), and the results were combined and checked manually. The
predicted protein sequences were searched against a non-redundant protein database
using WUBLASTP and FASTA. The complete six-frame translation was used to search
PROSITE, and the predicted proteins were compared against the PFAM database of
protein domain hidden Markov models. The results of all these analyses were assembled
together using the Artemis sequence viewer (K.M.R., unpublished) and used to inform a
manual annotation of the sequence and predicted proteins. Annotation was based,
wherever possible, on characterized proteins or genes. Repeat sequences were identified
using HMMER (S. Eddy, personal communication) and the EMBOSS program 'profit'
(http://www.sanger.ac.uk/Software/EMBOSS/), based on sequence alignments generated
with ClustalW.
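
The read count and coverage quoted in the Methods are linked by the usual relation coverage = reads x mean read length / genome length. The Python sketch below uses it to back out the mean usable read length implied by the stated figures. The second helper applies the idealized Poisson (Lander-Waterman style) estimate of the fraction of bases left unsequenced, which assumes uniformly random reads and is offered only as a rough comparison; the function names are illustrative.

```python
import math

GENOME_BP = 2_184_406   # genome length from the text
READS = 58_269          # reads in the final assembly
COVERAGE = 10.0         # "about a 10-fold coverage"


def implied_mean_read_length(genome_bp: int, n_reads: int, coverage: float) -> float:
    """Mean read length (bp) implied by coverage = n_reads * length / genome."""
    return coverage * genome_bp / n_reads


def expected_unsequenced_fraction(coverage: float) -> float:
    """Idealized Poisson estimate of the fraction of bases hit by no read."""
    return math.exp(-coverage)


print(round(implied_mean_read_length(GENOME_BP, READS, COVERAGE)))
print(expected_unsequenced_fraction(COVERAGE))
```

Real shotgun libraries are not perfectly random, which is one reason finishing steps such as the PCR and large-insert checks described above are still needed.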

Signals elicited by binding of the T-cell antigen receptor and the
CD4/CD8 co-receptor to major histocompatibility complex
(MHC) molecules control the generation of CD4+ (helper) or
CD8+ (cytotoxic) T cells from thymic precursors that initially
express both co-receptor proteins 1. These precursors have unique,
clonally distributed T-cell receptors with unpredictable specificity
for the self-MHC molecules involved in this differentiation
process 3. However, the mature T cells that emerge express only
the CD4 (MHC class II-binding) or CD8 (MHC class I-binding)
co-receptor that complements the MHC class-specificity of the
T-cell receptor. How this matching of co-receptor-defined lineage
and T-cell-receptor specificity is achieved remains unknown, as
does whether signalling by the T-cell receptors, co-receptors and/or
general cell-fate regulators such as Notch-1 (refs 5, 6) contributes
to initial lineage choice, to subsequent differentiation
processes or to both. Here we show that the CD4 versus CD8
lineage fate of immature thymocytes is controlled by the
co-receptor-influenced duration of initial T-cell receptor-dependent
signalling. Notch-1 does not appear to be essential for this fate
determination, but it is selectively required for CD8+ T-cell
maturation after commitment directed by T-cell receptors.
This indicates that the signals constraining CD4 versus CD8
lineage decisions are distinct from those that support subsequent
differentiation events such as silencing of co-receptor loci.

The AND T-cell receptor (TCR) is specific for a pigeon cytochrome
c peptide bound to the MHC class II molecule I-Ek (ref. 7).
Transgenic thymocytes expressing this TCR are efficiently selected
into the CD4+ lineage in mice expressing wild-type I-Ab MHC class
II molecules. In contrast, AND TCR transgenic mice expressing
mutant I-Ab molecules that are defective in interaction with CD4
but not the TCR 8 generate CD8+ but not CD4+ mature cells 9. To
investigate how altering CD4 co-receptor binding controls the
lineage fate of AND thymocytes, we took advantage of a modified
two-stage reaggregate culture system 10 that allows controlled delivery
of MHC-dependent and independent signals to thymocytes at
distinct stages of maturation (Yasutomo et al., manuscript in
preparation). Immature CD69lo CD4+CD8+ TCR transgenic thymocytes
(double positive; DP) are incubated in dispersed culture with
cells expressing the desired MHC molecule ligands, to initiate
selection events (culture 1). This first TCR stimulation does not
lead to silencing of the CD4 or CD8 locus in dispersed culture for up
to three days, but it does rapidly induce upregulation of the
activation marker CD69 that characterizes thymocytes undergoing
maturation in vivo. When the CD69med/hi thymocytes arising in
culture 1 after 20 h are purified and reaggregated with either
MHC-positive thymic stromal cells (TSC) or MHC-negative TSC in the
presence of dendritic cells of the appropriate MHC type (culture 2),
mature functional T cells expressing only a single co-receptor
develop over the next 60 h. Experiments using this model showed
that TCR-MHC molecule interactions are required not only in
culture 1 to generate the CD69hi thymocytes, but also in culture 2 to
generate mature T cells. Using this approach, we asked whether the
first, the second or both sets of TCR/co-receptor-MHC interaction
events determined the differentiation of AND TCR transgenic T
cells along the CD4 versus CD8 pathways.

Thymocytes from mice expressing only the AND TCR in the
absence of MHC class II molecules (AND TCR transgenic RAG-2-/-
MHC Aβ-/-, referred to as AND throughout) were stimulated in
culture 1 using dendritic cells from wild-type mice (WT-DC) or
mutant (Mu-DC) I-Ab transgenic mice. When purified cells of the
CD69+CD4+CD8+ phenotype (CD69DP) generated by this stimulation
were reaggregated with TSC from MHC class II-/- mice, no
CD4+ or CD8+ mature T cells developed (Fig. 1a, b). Inclusion of

Figure 1 Role of co-receptor in CD4/CD8 lineage choice. Sorted CD69+CD4+CD8+ cells
obtained after stimulation by wild-type dendritic cells (WT-DC; a, c, e) or mutant dendritic
cells (Mu-DC; b, d, f) were cultured with MHC class II-/- TSC (a, b) or MHC-/- TSC and
wild-type dendritic cells (WT; c, f), or mutant dendritic cells (Mu; d, e) for 72 h in thymic
reaggregates. The recovered cells from all cultures were stained with anti-CD4 or CD8
monoclonal antibodies and examined by flow cytometry. The percentage of cells with a
CD4+ or CD8+ phenotype is given in the upper left or lower right quadrant of each panel
and the absolute number of recovered viable thymocytes is given below each panel.

Abstract

A ~4.8 kb KpnI fragment, from the upstream region of the methylmalonyl-CoA mutase gene (mutAB) of rifamycin
SV-producing Amycolatopsis mediterranei, was cloned and partially sequenced. Codon preference analysis showed three complete
ORFs. ORF2 is internal to ORF1, and encodes a polypeptide corresponding to 172 amino acids, whereas ORF1 encodes a
polypeptide of 421 amino acids. They were identified as the encoding genes of aspartokinase α- and β-subunits by comparing the
amino acid sequences with those in the database. The downstream ORF3, whose start codon overlapped the stop codon
of both ORF1 and ORF2 by 1 bp, was identified as the aspartate semialdehyde dehydrogenase gene (asd), encoding a polypeptide
of 346 amino acids. Subclones containing either the ask gene or the asd gene were constructed, in which the genes could be
expressed under Lac promoters. Two subclones could transform E. coli CGSC 5074 (ask-) and E. coli X6118 (asd-) to prototrophy,
supporting the functional assignments. Southern hybridisation indicated that the ~4.8 kb sequenced region represented a
continuous segment in the A. mediterranei chromosome. It is concluded that ask and asd genes are present in an operon in A.
mediterranei and therefore that organisation of these two genes is the same as in most gram-positive bacteria, such as Mycobacteria,
Corynebacterium glutamicum and Bacillus subtilis, but is different from Streptomyces akiyoshiensis. © 1999 Elsevier Science B.V.
All rights reserved.

Keywords: Amycolatopsis mediterranei; Aspartate semialdehyde dehydrogenase; Aspartokinase; Operon



1. Introduction

In most bacteria, amino acids of the aspartate family,
such as lysine, methionine and threonine, are produced
from aspartate by a series of enzymatic steps. Bacterial
biosynthesis of the above amino acids differs from most
other pathways of amino acid biosynthesis in that one
of the intermediates, diaminopimelate (DAP), has an
important metabolic function in its own right as a
constituent of the bacterial cell wall peptidoglycan. To
ensure that the amino acids produced do not interfere
with cell wall biosynthesis, a complex regulatory system
operates in this pathway. Collectively, these pathways
constitute the so-called aspartate pathway, a
branched pathway whose common steps are regulated
to allow the balanced synthesis of the various end
products. Several distinct genomic organisations and a
diversity of regulatory mechanisms controlling the metabolic
flux through this multibranched pathway have
been identified in bacteria.

Abbreviations: aa, amino acid(s); bp, base pair(s); kb, kilobase pairs;
kDa, kilodalton (molecular weight); ORF, open reading frame;
RBS, ribosome-binding site; DAP, diaminopimelate.

The nucleotide sequence data reported in this paper have been
submitted to GenBank under the accession number AF134837.

* Corresponding author. Present address: ISBDD, Suite 212b, 800
East Leigh St., Virginia Biotechnology Park, Richmond, VA 23220,
USA.

E-mail address: wzhang@hsc.vcu.edu (W. Zhang)

The enzymes catalyzing the first two steps of
this core pathway, aspartokinase (Ask, EC 2.7.2.4)
and aspartate semialdehyde dehydrogenase (Asd,
EC 1.2.1.11), are well conserved functionally. In
Escherichia coli there are three isozymes of aspartokinase,
regulated by lysine, methionine and threonine,
respectively (Theze et al., 1974). In Bacillus, the genes
encoding three isozymes of Ask, regulated by diaminopimelic
acid (DAP), lysine, or lysine and methionine, have
been cloned and sequenced (Chen et al., 1989). Only
one Ask has been found in Pseudomonas, Brevibacteria,
Mycobacteria, Thermus and Corynebacteria (Shiio and
Miyajima, 1969; Cohen et al., 1969; Cirillo et al., 1994;
Cremer et al., 1988; Nishiyama et al., 1995; Jetten
et al., 1995).

The asd gene has been cloned from many microorganisms,
such as E. coli (Haziza et al., 1982), Haemophilus
influenzae, Salmonella typhimurium (Galan et al., 1990),
Streptococcus mutans (Jagusztyn-Krynicka et al., 1982;
Cardineau and Curtiss, 1987), Corynebacterium glutamicum
(Kalinowski et al., 1990, 1991), Saccharomyces
cerevisiae (Thomas and Surdin Kerjan, 1989),
Mycobacteria (Cirillo et al., 1994), Leptospira interrogans
(Baril et al., 1992) and Pseudomonas aeruginosa
(Hoang et al., 1997). The E. coli asd gene appears to
be regulated principally by lysine and secondarily by
methionine and threonine (Haziza et al., 1982). The ask
and asd genes form an operon in gram-positive
Mycobacteria, Bacillus and Corynebacteria. The first
asd gene to be isolated from an antibiotic-producing
Streptomyces (Streptomyces akiyoshiensis) was
sequenced recently. The deduced amino acid sequences
of the ORFs adjacent to asd showed no similarity to
any ask gene; it therefore seems that streptomycetes differ
from other gram-positive bacteria in the organisation of
ask and asd (Le et al., 1996).

In streptomycetes, diaminopimelate (DAP) from the
aspartate pathway is an important constituent of the
cell wall (Zakharova et al., 1980). Aspartate could be a
precursor for both protein amino acids and some secondary
metabolites, and it has been shown that regulatory
mechanisms acting on aspartate pathways control the
flow of metabolic intermediates to some antibiotics
(Vining et al., 1990). To date, no aspartokinase gene
has been cloned from actinomycetes, and knowledge of
the gene structure and regulatory mechanism of the
aspartate pathway in actinomycetes is limited. To better
understand the aspartate pathway in an antibiotic-producing
actinomycete, and its relationship with secondary
metabolism, we cloned and sequenced the ask and asd
operon from rifamycin SV-producing Amycolatopsis
mediterranei, and expressed them heterologously in E.
coli.



2. Results and discussion 

2.1. Cloning and sequencing of a ~4.8 kb segment from
the upstream region of A. mediterranei mutAB

We have already determined the nucleotide sequence
of a ~7.8 kb KpnI segment of pCZ8, which contained the
methylmalonyl-CoA mutase (mutAB) genes and a novel
kinase gene (Zhang et al., in preparation). In the present
study, a ~4.8 kb KpnI fragment, which is ~2 kb
upstream of the above ~7.8 kb KpnI fragment, was
cloned into pUC18, generating pCZ6. A ~3 kb KpnI/PstI
fragment of pCZ6 was sequenced. Three complete ORFs
were found in this fragment (Fig. 1). ORF1 starts at nt
220-222 (GTG), is preceded by a putative
streptomycete-like RBS (GGAG) at nt 208-211, and
terminates at TGA (nt 1483-1485). By computer-aided
analysis, this ORF was found to encode a protein of
421 amino acids (Mr = 44.4 kDa), with significant
sequence similarity to aspartokinases from bacteria
(Table 1). Aspartokinases from most gram-positive bacteria
are encoded by two in-phase overlapping genes.
The high degree of similarity of the A. mediterranei
aspartokinase (ORF1) with those of Mycobacteria, B.
subtilis and C. glutamicum led us to expect a similar
gene organisation. Indeed, a second ORF (ORF2) was
found within ORF1. ORF2 starts at nt 967-969 (GTG),
ends at nt 1483-1485, is preceded by a putative
streptomycete-like RBS (GGAG at nt 957-960), and
encodes a putative polypeptide of 172 amino acids
(Mr = 18.2 kDa). Both ORF1 and ORF2 use the same
stop codon. We therefore tentatively name the two
ORFs askA and askB for the large (α) and small (β)
subunits of aspartokinase, respectively. The mean
G+C% contents in ORF1 and ORF2 are 68.0% and
66.6%, respectively, which are typical values for 
actinomycete ORFs. The upstream regions of the two ORFs 
showed no sequences corresponding to E. coli σ70-like 
-10 and -35 hexamers (Strohl, 1992). 

Table 1 

Identities and similarities among some aspartokinases and aspartate semialdehyde dehydrogenases from prokaryotic sources 

                                        Amycolatopsis mediterranei similarity (identity) (%) a 
Source                                  Aspartokinase      Aspartate semialdehyde dehydrogenase 
Mycobacterium smegmatis                 88.12 (74.11)      80.70 (68.13) 
M. tuberculosis                         79.17 (62.99)      81.34 (71.72) 
Corynebacterium glutamicum              85.27 (73.16)      76.04 (63.32) 
C. flavum                               85.75 (73.40)      76.33 (63.61) 
Pseudomonas aeruginosa                  69.00 (46.00)      48.82 (29.29) 
Bacillus subtilis                       65.11 (44.47)      62.98 (46.15) 
Aquifex aeolicus                        65.00 (44.00)      66.00 (51.00) 
Thermus flavus                          70.12 (49.38)      - 
T. thermophilus                         67.00 (46.00)      - 
B. stearothermophilus                   68.73 (47.40)      - 
Methanococcus jannaschii                59.00 (34.00)      50.00 (35.00) 
E. coli                                 54.06 (28.43) b    45.40 (26.71) 
Archaeoglobus fulgidus                  52.00 (30.00)      47.00 (34.00) 
Streptomyces akiyoshiensis              -                  81.25 (72.62) 
Actinobacillus pleuropneumoniae         -                  62.54 (46.53) 
Streptococcus mutans                    -                  61.40 (42.25) 
Vibrio cholerae                         -                  61.42 (45.66) 
Campylobacter jejuni                    -                  64.20 (45.68) 
Shewanella sp.                          -                  64.35 (46.52) 
Bordetella pertussis                    -                  47.61 (29.34) 
Burkholderia pseudomallei               -                  50.00 (32.14) 
Haemophilus influenzae                  -                  47.15 (27.63) 
Leptospira interrogans                  -                  52.81 (33.00) 
Salmonella typhimurium                  -                  48.67 (28.78) 
Methanobacterium thermoautotrophicum    -                  46.00 (31.00) 

a Values were derived with the GAP program from the University of Wisconsin Genetics Computer Group Package, with a gap weight of 3 and 
a gap length of 0.1. Vibrio cholerae (U25082), Shewanella sp. (D49539) (Raio et al., 1997), Campylobacter jejuni (X97964) (Pawelec and Jagusztyn-Krynicka, 
1996), Leptospira interrogans (S92223), Bordetella pertussis (X75813), Actinobacillus pleuropneumoniae (U51440), T. thermophilus 
(AB013131), Pseudomonas aeruginosa ask (AF061757), Methanococcus jannaschii (Q57991) (Bult et al., 1996), Archaeoglobus fulgidus ask 
(AE001056) and asd (AE000998) (Klenk et al., 1997), Aquifex aeolicus (AE000726) (Deckert et al., 1998), Methanobacterium thermoautotrophicum 
(AE000858) (Smith et al., 1997). 

b Lysine-sensitive aspartokinase III of E. coli (U00096) (Blattner et al., 1997). 
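
The ORF1 and ORF2 coordinates reported in Section 2.1 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the original analysis: it assumes the coordinates are 1-based and inclusive of the stop codon, and the function names are hypothetical.

```python
# Coordinates as reported in the text (assumed 1-based, inclusive of the stop codon).
ORFS = {
    "ORF1 (askA)": (220, 1485),
    "ORF2 (askB)": (967, 1485),
}


def orf_summary(start, end):
    """Return (length in nt, number of encoded amino acids) for an ORF
    whose coordinates include the stop codon."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length, length // 3 - 1  # the stop codon encodes no residue


def same_frame(start_a, start_b):
    """Two ORFs on the same strand are in phase when their start
    positions differ by a multiple of three."""
    return (start_b - start_a) % 3 == 0


for name, (start, end) in ORFS.items():
    nt, aa = orf_summary(start, end)
    print(f"{name}: {nt} nt, {aa} aa")

print("ORF1 and ORF2 in phase:", same_frame(220, 967))
```

Under these assumptions the computed sizes agree with the 421 and 172 amino acids given in the text, and the two start codons fall in the same reading frame, as expected for in-phase overlapping genes.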

In the gram-positive bacteria Mycobacteria, Bacillus 
and Corynebacteria, askA and askB are organised in 
one operon with aspartate semialdehyde dehydrogenase 
(asd), but this is not the case in S. akiyoshiensis. The 
askAB genes from A. mediterranei have high similarity 
to other cloned ask genes, so we also expected to identify 
the asd gene nearby. BLAST analysis showed that the 
downstream ORF3, which overlaps the askAB genes by 
1 bp, has 45-71% identity with asd genes from 
gram-positive bacteria, and 28-33% identity with those 
from gram-negative bacteria. ORF3 starts at nt 
1485-1487 (ATG), is preceded by a putative RBS (GGAG) 
at nt 1472-1475, ends at nt 2523-2525, has a G+C 
content of 70.34%, and encodes a single polypeptide of 
36.4 kDa. On the basis of the BLAST results, we 
tentatively identify ORF3 as the asd gene. The askAB and 
asd genes possess overlapping stop and start codons at 
nt 1485, an arrangement which is thought to lead to 
translational coupling and which is also found in the 
A. mediterranei mutAB genes, but is not seen 
in the Mycobacteria, Bacillus and Corynebacteria 
askAB-asd operons. The order of the askAB and asd 
genes in A. mediterranei is the same as in 
Mycobacteria and Corynebacteria, but differs from 
that in Bacillus, in which the order of the askAB and 
asd genes is reversed (Kalinowski et al., 1990). 
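
The stop/start overlap between askAB and asd described above can be made concrete with a small interval calculation. This is a minimal sketch under the assumption that all coordinates are 1-based and inclusive; the helper name `overlap_bp` is hypothetical, and the residue count derived for ORF3 is an inference from the coordinates (the text itself reports only the 36.4 kDa mass).

```python
def overlap_bp(a, b):
    """Number of positions shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


ASK_STOP = (1483, 1485)   # TGA stop codon of askA/askB, as reported
ASD_START = (1485, 1487)  # ATG start codon of ORF3 (asd), as reported
ASD = (1485, 2525)        # full ORF3, stop codon included
ASKA_START = 220          # first nucleotide of the askA start codon

shared = overlap_bp(ASK_STOP, ASD_START)

# Merge the two codons over the shared base(s): TGA + ATG -> TGATG
junction = "TGA" + "ATG"[shared:]

asd_length = ASD[1] - ASD[0] + 1
asd_residues = asd_length // 3 - 1               # stop codon excluded
frame_offset = (ASD[0] - ASKA_START) % 3         # 0 would mean same frame as askA

print("shared bases:", shared)
print("junction sequence:", junction)
print("ORF3 length:", asd_length, "nt,", asd_residues, "residues")
print("frame offset of asd relative to askA:", frame_offset)
```

A non-zero frame offset together with a one-base overlap is the arrangement usually associated with translational coupling, which is consistent with the interpretation given in the text.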

To confirm that the ask and asd genes are arranged in 
the same way in the chromosome, A. mediterranei genomic 
DNA and pCZ6 were extracted, digested completely with 
KpnI and probed by Southern hybridisation with the 
32P-labelled 4.8 kb KpnI insert of pCZ6. A clear 4.8 kb 
band could be identified in the genomic DNA lane, 
indicating that the cloned 4.8 kb KpnI fragment 
represented the true sequence from A. mediterranei (data 
not shown). These observations suggested that the askAB 
and asd genes might be co-transcribed in one mRNA as 
an operon. The hybridisation results also suggest that 
there is only one ask and one asd gene in A. mediterranei. 
Further study is under way to determine the ask-asd 
promoter. 

Fig. 2. Multiple alignment of A. mediterranei Ask and some Asks from other sources. The alignment was generated using GCG Pileup. The positions 
conserved in all Ask proteins are marked by an asterisk. Ms: Mycobacterium smegmatis (Z17372), Mt: M. tuberculosis (U90239), Cf: 
Corynebacterium flavum (L16848), Cg: C. glutamicum (X57226), Tf: Thermus flavus (D37928), Bt: Bacillus stearothermophilus (L4635H) (Cantooi 
et al., 1996), Bs: B. subtilis (J03294), Am: A. mediterranei (AFW837). 

Fig. 3. Multiple alignment of A. mediterranei Asd and some Asds from other sources. The alignment was generated using GCG Pileup. The positions 
conserved in all proteins are marked by an asterisk. The positions conserved only in Asd proteins from gram-positive bacteria are marked +. Ms: 
M. smegmatis (Z17372), Mt: M. tuberculosis (U90239), Sa: Streptomyces akiyoshiensis (U29446), Cf: Corynebacterium flavum (L16848), Cg: C. 
glutamicum (X57226), Ec: E. coli (V00262), Pa: Pseudomonas aeruginosa (UU055), St: Salmonella typhimurium (AF015781), Bs: B. subtilis 
(Z22554), Sm: S. mutans (J02667), Hi: Haemophilus influenzae (U32747), Am: A. mediterranei (AFU4S37). 

A perfect inverted repeated sequence 12 bp in length, 
capable of forming a stable stem-loop structure, which 
probably serves as a ρ-independent transcriptional 
terminator, was found 13 bp downstream of the asd 
gene. 
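
A perfect inverted repeat of the kind described above can be located by comparing each window of a sequence with the reverse complement of a downstream window. The sketch below is only illustrative: the test sequence is invented, the arm length and loop range are assumed parameters, and it is not the method used in the original work.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=12, min_loop=3, max_loop=8):
    """Return (start_index, loop_length) for every perfect inverted repeat:
    a left arm of `arm` nt followed, after a loop of min_loop..max_loop nt,
    by the reverse complement of that arm. Indices are 0-based."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm - min_loop + 1):
        target = revcomp(seq[i:i + arm])
        for loop in range(min_loop, max_loop + 1):
            j = i + arm + loop
            if seq[j:j + arm] == target:
                hits.append((i, loop))
    return hits


# Invented example: 3 nt flank, 12 nt arm, 4 nt loop, matching arm, 4 nt tail.
arm_seq = "GCCCGCACCGGA"
toy = "AAT" + arm_seq + "TTCG" + revcomp(arm_seq) + "TTTT"
print(find_inverted_repeats(toy))
```

A hit only shows that a stem-loop is possible; whether it acts as a ρ-independent terminator would also depend on features such as a downstream T-rich tract, which this sketch does not check.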

2.2. Analysis of the deduced aa sequence of the askAB 
and asd genes 

Comparison of the deduced sequence of aspartokinase, 
encoded by askAB of A. mediterranei, with the 
amino acid sequences of nine other aspartokinases 
derived from gram-positive and -negative bacteria 
reveals considerable homology between all 
aspartokinases (Table 1 and Fig. 2). Aspartokinase from 
A. mediterranei shares 44-74% identical residues with 
aspartokinases from other gram-positive bacteria, and 
28.43% identical residues with E. coli aspartokinase III. 
High similarity is found around the amino terminus, and 
a lower level of identity is also seen around aa 250-350. 
Several possible 'signature' motifs (flanking numbers 
give residue positions), 5VQKXGGXS12, 
38WXSAMGXTTDELXXLA54 and 61PXXREXDXLT69, 
are found at the amino terminus. The longest 
conserved motif, 137TTLGRGGSDTTAVAXAAAL155, is 
located in the middle of the askA gene. The occurrence 
of these highly conserved elements in aspartokinases 
from various bacteria with widely different patterns of 
allosteric regulation suggests that their structural 
roles relate to catalysis rather than to allosteric 
regulation. Chen et al. (1989) found that mutation of 
residue 299 of B. subtilis aspartokinase changes the 
allosteric characteristics of the enzyme; this residue 
lies in a region in which the aspartokinase sequences 
show high divergence. Observations made on proteolytic 
fragments of E. coli aspartokinase revealed that the 
aspartokinase activity resides in the first 248 residues 
(Veron et al., 1985), in agreement with the homology 
pattern described here. The carboxyl-terminal region 
shows a lower degree of sequence conservation. This 
region is thought to be involved in the association of 
subunits. The β-subunit, which has been found to be 
unnecessary for aspartokinase activity and may be 
involved in maintaining the tertiary structure of the 
aspartokinase enzyme complex, also shows 
lower identity (Cirillo et al., 1994). 
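
Signature motifs written with X as a wildcard, such as VQKXGGXS, can be searched for in a protein sequence by translating them into regular expressions. The following is a minimal sketch; the peptide used here is a made-up toy string rather than a real aspartokinase sequence, and `find_motif` is a hypothetical helper name.

```python
import re


def find_motif(motif, protein):
    """Return (start, end, matched_text) for every occurrence of `motif`
    in `protein`, where X in the motif matches any residue.
    Positions are 1-based and inclusive; overlapping hits are reported."""
    body = "".join("." if c == "X" else re.escape(c) for c in motif)
    pattern = re.compile(f"(?=({body}))")  # lookahead allows overlapping matches
    return [
        (m.start() + 1, m.start() + len(motif), m.group(1))
        for m in pattern.finditer(protein)
    ]


toy_peptide = "MALVVQKFGGTSVG"  # invented N-terminal fragment for illustration
for start, end, text in find_motif("VQKXGGXS", toy_peptide):
    print(f"{start}-{end}: {text}")
```

Applied to a set of full-length sequences, the same function would show whether a candidate motif occurs at a comparable position in each protein.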

The deduced amino acid sequence of A. mediterranei 
aspartate semialdehyde dehydrogenase shows a very 
high per cent identity with the sequences of the 
corresponding enzymes from gram-positive bacteria, with 
highly conserved regions distributed throughout the 
protein. Although we found that Asd from 
A. mediterranei could substitute for the E. coli Asd, the 
amino acid identity between this protein and Asd enzymes 
from gram-negative bacteria, including E. coli, is much 
lower (Table 1). A few regions are conserved in all 
species (Fig. 3). In conjunction with earlier ligand 
experiments, Biellman et al. (1980) and Haziza et al. 
(1982) identified the sequence 130FVGGNCTVS138 as being 
part of the active site of Asd from E. coli. The cysteine 
residue in this motif was thought to be alkylated during 
the catalytic reaction. Fig. 3 shows that the cysteine in 
this motif (position 160) is conserved in both 
gram-positive and -negative bacteria. 
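
Percent identity between two aligned sequences and the columns conserved across a whole alignment, the two quantities discussed above, can be read directly from an existing alignment. The sketch below assumes identity is counted over columns where neither sequence has a gap, which may differ from the definition used by the GCG GAP program; it scores a given alignment and does not perform the alignment itself. The three aligned fragments are invented for illustration.

```python
def percent_identity(a, b):
    """Percent identical residues between two aligned sequences,
    counted over columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def conserved_columns(alignment):
    """1-based indices of gap-free columns with the same residue in every row."""
    return [
        i + 1
        for i, column in enumerate(zip(*alignment))
        if "-" not in column and len(set(column)) == 1
    ]


# Invented aligned fragments; not taken from Fig. 3.
toy_alignment = [
    "GIKTFVGGNCTVSL",
    "AKGIIANPNCTTMA",
    "PKGI-ANPNCSTIQ",
]

print(round(percent_identity(toy_alignment[0], toy_alignment[1]), 2))
print(conserved_columns(toy_alignment))
```

In this toy case the conserved columns include the cysteine, mirroring the way an asterisk marks universally conserved positions in the figure.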

2.3. Complementation of askAB and asd mutants of 
E. coli 

To confirm the functions of the putative askAB 
and asd genes, the ~1.6 kb BamHI and ~1.8 kb 
BglII/EcoRI fragments from pCZ6 (Fig. 1) were 
subcloned into pUC18 and pUC19, respectively, generating 
plasmids pCZ61 (containing askAB) and pCZ62 
(containing asd). In both cases, the genes could be 
expressed from the vector lac promoter. The two plasmids 
were used to transform E. coli CGSC5074 (an auxotrophic 
strain mutated in all three ask genes) and E. coli X6118 
(an auxotrophic strain with a mutation in the asd gene). 
Asd-complementing colonies were observed on LB agar 
containing 100 μg/ml ampicillin and 2 mM IPTG in the 
absence of diaminopimelic acid (DAP), and Ask 
complementation was observed on M9 medium with 
100 μg/ml ampicillin and 2 mM IPTG. The successful 
complementation results suggested that ORF1 and ORF2 
encode the askAB genes and that ORF3 encodes the asd 
gene. 

3. Conclusions 



1. The complete aspartokinase and aspartate 
semialdehyde dehydrogenase genes from rifamycin 
SV-producing A. mediterranei were cloned, sequenced 
and expressed in E. coli. The askA, askB and asd genes 
encode proteins of 44.4 kDa, 18.2 kDa and 36.4 kDa, 
respectively. To our knowledge, this is the first cloned 
aspartokinase from actinomycetes. 

2. The evidence presented here indicates that the askAB 
and asd genes of A. mediterranei are organised in one 
operon, as is also the case in the gram-positive 
Mycobacteria, Corynebacteria and Bacillus, but unlike 
S. akiyoshiensis, in which the ask and asd 
genes are well separated. 

3. A few motifs possibly involved in the catalytic 
activity of aspartokinase were found around the amino 
terminus and the middle of the protein. Aspartate 
semialdehyde dehydrogenase showed much higher 
variability between gram-positive and -negative 
bacteria, and no obvious motif could be found in 
enzymes from all species. 